optimization process
Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we neither adapt the learning rates nor make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
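The reduction to coin betting mentioned in the abstract above can be illustrated with a small one-dimensional sketch. This is not the paper's deep-network procedure: it assumes the classical Krichevsky-Trofimov betting strategy, a convex objective, and subgradients clipped to [-1, 1]. The function name `kt_coin_betting_minimize` and its parameters are hypothetical, chosen only for this illustration.

```python
def kt_coin_betting_minimize(subgrad, steps, initial_wealth=1.0):
    """Minimize a 1-D convex function by betting on 'coins' c_t = -g_t.

    Illustrative sketch only (assumed Krichevsky-Trofimov bettor), not the
    algorithm from the paper. No learning rate appears anywhere.
    """
    wealth = initial_wealth   # money the bettor currently holds
    coin_sum = 0.0            # sum of past coin outcomes c_1 .. c_{t-1}
    iterate_sum = 0.0         # running sum of iterates, for the average

    for t in range(1, steps + 1):
        # Bet a signed fraction of the current wealth; the bet is the iterate.
        fraction = coin_sum / t
        w = fraction * wealth

        # The coin outcome is the negative subgradient, clipped to [-1, 1].
        g = subgrad(w)
        c = -max(-1.0, min(1.0, g))

        # Win or lose money in proportion to the bet.
        wealth += c * w
        coin_sum += c
        iterate_sum += w

    # The averaged iterate is the output of the online-to-batch conversion.
    return iterate_sum / steps


if __name__ == "__main__":
    # Subgradient of f(x) = |x - 10|, whose minimizer is x = 10.
    x_hat = kt_coin_betting_minimize(lambda x: 1.0 if x > 10 else -1.0, steps=20000)
    print(x_hat)
```

Under these assumptions the averaged iterate should approach the minimizer without any step size being specified; the only free parameter is the initial wealth, which plays a much weaker role than a learning rate.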
Multi-fidelity approaches for general constrained Bayesian optimization with application to aircraft design
Cordelier, Oihan; Diouane, Youssef; Bartoli, Nathalie; Laurendeau, Eric
Aircraft design relies heavily on solving challenging and computationally expensive Multidisciplinary Design Optimization (MDO) problems. In this context, there has been growing interest in multi-fidelity models for Bayesian optimization to improve the MDO process by balancing computational cost and accuracy through the combination of high- and low-fidelity simulation models, enabling efficient exploration of the design process with minimal computational effort. In the existing literature, fidelity selection focuses only on the objective function to decide how to integrate multiple fidelity levels, balancing precision and computational cost using variance reduction criteria. In this work, we propose novel multi-fidelity selection strategies. Specifically, we demonstrate how incorporating information from both the objective and the constraints can further reduce computational costs without compromising the optimality of the solution. We validate the proposed multi-fidelity optimization strategy by applying it to four analytical test cases, showcasing its effectiveness. The proposed method is then used to efficiently solve a challenging aircraft wing aero-structural design problem. The proposed setting uses a linear vortex lattice method for the aerodynamic analysis and a finite element method for the structural analysis. We show that, given a limited budget, our proposed multi-fidelity approach yields $86\%$ to $200\%$ more constraint-compliant solutions than the state-of-the-art approach.
First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data
Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy resources. Yet the optimization algorithms behind these runs have not kept pace. Most large-scale training still relies on synchronous methods, where workers must wait for the slowest device, wasting compute and amplifying the effects of hardware and network variability. Removing synchronization seems like a simple fix, but asynchrony introduces staleness: updates are computed on outdated models. This makes analysis difficult, especially when delays arise from system-level randomness rather than algorithmic choices. As a result, the time complexity of asynchronous methods remains poorly understood. This dissertation develops a rigorous framework for asynchronous first-order stochastic optimization, focusing on the core challenge of heterogeneous worker speeds. Within this framework, we show that with proper design, asynchronous SGD can achieve optimal time complexity, matching guarantees previously known only for synchronous methods. Our first contribution, Ringmaster ASGD, attains optimal time complexity in the homogeneous data setting by selectively discarding stale updates. The second, Ringleader ASGD, extends optimality to heterogeneous data, common in federated learning, using a structured gradient table mechanism. Finally, ATA improves resource efficiency by learning worker compute time distributions and allocating tasks adaptively, achieving near-optimal wall-clock time with less computation. Together, these results establish asynchronous optimization as a theoretically sound and practically efficient foundation for distributed learning, showing that coordination without synchronization can be both feasible and optimal.
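The idea of discarding stale updates can be sketched with a toy parameter server. This is an illustration of staleness filtering in general, not the Ringmaster ASGD algorithm itself: the class `DelayFilteringServer`, the fixed `max_delay` threshold, and the definition of delay as the number of model updates applied since the worker read the model are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DelayFilteringServer:
    """Toy asynchronous SGD server that ignores overly stale gradients.

    Hypothetical sketch: the threshold rule below is an assumed, simplified
    stand-in for a staleness filter, not the dissertation's method.
    """
    x: float          # current model (a scalar, for simplicity)
    lr: float         # step size
    max_delay: int    # largest tolerated delay, counted in model updates
    version: int = 0  # number of updates applied so far

    def read(self):
        """A worker fetches the model together with its version number."""
        return self.x, self.version

    def submit(self, grad, read_version):
        """Apply a worker's gradient unless it was computed on a stale model."""
        delay = self.version - read_version
        if delay > self.max_delay:
            return False  # discard: the model moved too far since the read
        self.x -= self.lr * grad
        self.version += 1
        return True


if __name__ == "__main__":
    server = DelayFilteringServer(x=1.0, lr=0.1, max_delay=1)
    _, v0 = server.read()          # three workers all read version 0
    print(server.submit(2.0, v0))  # delay 0: applied
    print(server.submit(2.0, v0))  # delay 1: applied
    print(server.submit(2.0, v0))  # delay 2: discarded
```

In this sketch a slow worker never blocks the others; its gradient is simply dropped once the model has advanced more than `max_delay` versions past the one it read.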
Batched Energy-Entropy acquisition for Bayesian Optimization
Bayesian optimization (BO) is an attractive machine learning framework for performing sample-efficient global optimization of black-box functions. The optimization process is guided by an acquisition function that selects points to acquire in each round of BO. In batched BO, where multiple points are acquired in parallel, commonly used acquisition functions are often high-dimensional and intractable, leading to the use of sampling-based alternatives. We propose a statistical-physics-inspired acquisition function that can natively handle batches. Batched Energy-Entropy acquisition for BO (BEEBO) enables tight control of the explore-exploit trade-off of the optimization process and generalizes to heteroskedastic black-box problems. We demonstrate the applicability of BEEBO on a range of problems, showing performance competitive with existing acquisition functions.